Convergence of Stochastic Iterative Dynamic Programming Algorithms

Authors

  • Tommi S. Jaakkola
  • Michael I. Jordan
  • Satinder P. Singh
Abstract

Increasing attention has recently been paid to algorithms based on dynamic programming (DP) due to the suitability of DP for learning problems involving control. In stochastic environments where the system being controlled is only incompletely known, however, a unifying theoretical account of these methods has been missing. In this paper we relate DP-based learning algorithms to the powerful techniques of stochastic approximation via a new convergence theorem, enabling us to establish a class of convergent algorithms to which both TD(λ) and Q-learning belong.
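
As an illustration only (the code below is not taken from the paper), tabular Q-learning can be written as a stochastic approximation iteration whose per-pair step sizes 1/n(s, a) sum to infinity while their squares have a finite sum. The two-state MDP, the discount factor, the uniform exploration rule, and every function name are hypothetical choices made for this sketch; value iteration on the known model is included only as a reference to compare against.

```python
import random

# Hypothetical toy MDP (an assumption for this sketch, not from the paper).
# P[s][a] lists (probability, next_state); R[s][a] is the immediate reward.
P = {
    0: {0: [(0.9, 0), (0.1, 1)], 1: [(0.2, 0), (0.8, 1)]},
    1: {0: [(1.0, 0)], 1: [(0.5, 0), (0.5, 1)]},
}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 2.0, 1: 0.0}}
GAMMA = 0.5  # discount factor (assumed)


def sample_next_state(rng, s, a):
    """Draw a successor state according to P[s][a]."""
    u, acc = rng.random(), 0.0
    for prob, nxt in P[s][a]:
        acc += prob
        if u < acc:
            return nxt
    return P[s][a][-1][1]  # guard against floating-point round-off


def q_learning(steps=200_000, seed=0):
    """Tabular Q-learning with step size 1 / (visit count of the pair)."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in P[s]} for s in P}
    visits = {s: {a: 0 for a in P[s]} for s in P}
    s = 0
    for _ in range(steps):
        a = rng.choice([0, 1])  # uniform exploration visits every pair
        s_next = sample_next_state(rng, s, a)
        visits[s][a] += 1
        alpha = 1.0 / visits[s][a]
        # Noisy sample of the DP backup for the pair (s, a).
        target = R[s][a] + GAMMA * max(q[s_next].values())
        q[s][a] += alpha * (target - q[s][a])
        s = s_next
    return q


def value_iteration(sweeps=200):
    """Reference Q* computed from the known model by repeated DP backups."""
    q = {s: {a: 0.0 for a in P[s]} for s in P}
    for _ in range(sweeps):
        q = {
            s: {
                a: R[s][a]
                + GAMMA * sum(p * max(q[n].values()) for p, n in P[s][a])
                for a in P[s]
            }
            for s in P
        }
    return q
```

Calling q_learning() and value_iteration() and comparing the two tables entry by entry shows the sampled iterates settling near the DP solution on this toy problem; the sketch makes no claim about convergence rates.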


Similar resources

Learning Algorithms for Risk-Sensitive Control

This is a survey of some reinforcement learning algorithms for risk-sensitive control on infinite horizon. Basics of the risk-sensitive control problem are recalled, notably the corresponding dynamic programming equation and the value and policy iteration methods for its solution. Basics of stochastic approximation algorithms are also sketched, in particular the ‘o.d.e.’ approach for its stabil...

Robust inter and intra-cell layouts design model dealing with stochastic dynamic problems

In this paper, a novel quadratic assignment-based mathematical model is developed for concurrent design of robust inter and intra-cell layouts in dynamic stochastic environments of manufacturing systems. In the proposed model, in addition to considering time value of money, the product demands are presumed to be dependent normally distributed random variables with known expectation, variance, a...

Differential Dynamic Programming for Solving Nonlinear Programming Problems

Dynamic programming is one of the methods that exploit the special structure of large-scale mathematical programming problems. Conventional dynamic programming, however, can hardly solve mathematical programming problems with many constraints. This paper proposes differential dynamic programming algorithms for solving large-scale nonlinear programming problems with many constraints and proves thei...

Strong convergence of modified iterative algorithm for family of asymptotically nonexpansive mappings

In this paper we introduce new modified implicit and explicit algorithms and prove strong convergence of the two algorithms to a common fixed point of a family of uniformly asymptotically regular asymptotically nonexpansive mappings in a real reflexive Banach space with a uniformly Gâteaux differentiable norm. Our result is applicable in $L_{p}(\ell_{p})$ spaces, $1 < p ...

Application of DJ method to Ito stochastic differential equations

This paper develops the iterative method described by [V. Daftardar-Gejji, H. Jafari, An iterative method for solving nonlinear functional equations, J. Math. Anal. Appl. 316 (2006) 753-763] to solve Ito stochastic differential equations. The convergence of the method for Ito stochastic differential equations is assessed. To verify the efficiency of the method, some examples are ex...

Publication date: 1993